Summary of Review

Class Size: What Research Says and What It Means for State Policy argues that increasing
average class size by one student will save about 2% of total education spending with negligible
impact on academic achievement. It justifies this conclusion on the basis that Class-Size
Reduction (CSR) is not particularly effective and is not as cost-effective as other reforms.
However, this conclusion is based on a misleading review of the CSR research literature. The
report puts too much emphasis on studies that are of poor quality or that do not focus on
settings that are particularly relevant to the debate on class-size policy in the United States. It
argues that class-size reduction is less cost-effective than other reform policies, but it bases this
contention on an incomplete accounting of the benefits of smaller classes and an uncritical,
unexamined list of alternative policies. The report’s estimates of the potential cost savings are
flawed because, in reality, schools cannot structurally increase class size by only one student.
Well-documented and long-term non-academic gains from CSR are not addressed. Likewise, the
recommendation for releasing the “least effective” teachers assumes a valid way of making such
determinations is available.

I. Introduction

Class Size: What Research Says and What It Means for State Policy, authored by Grover J.
Whitehurst and Matthew M. Chingos and published by the Brown Center on Education Policy at
the Brookings Institution, estimates the potential cost savings of increasing the average
pupil-teacher ratio by one student to be $12 billion per year. To assess the likely learning
impacts on students of such an increase, the study reviews a subset of research evidence on
class size.

II. Findings and Conclusions of the Report 

The report argues that during the current budget crisis, increasing class size can save substantial
amounts of money with relatively few detrimental effects on students. Furthermore, it contends
that if class-size increases were coupled with the laying off of the least-effective instructors, any
potential negative impacts from larger classes could be offset by the resulting better instruction.
The report concludes by adding that class-size reduction policies are most effective when they
are targeted toward groups that benefit most, and when they are carefully weighed against
alternate policy options.

III. The Report’s Rationale for Its Findings and Conclusions 

To assess the likely impacts on students of an increase in class size, the study reviews some
research evidence on class-size reduction. In its literature review, the report concentrates on
only those studies the authors deem to be “high enough quality,” and they summarize the range
of impact estimates across these studies. The report characterizes the positive results from the
well-known and highly regarded Tennessee STAR Project, which used an experimental design,
as unusually large compared to the rest of the literature. The report also points to some prior
studies that find no impact of class size on student achievement, suggesting to the authors that
there may be little or no negative impact on students of moderate increases in class size.

IV. The Report’s Use of Research Literature 

The report’s primary contribution is in summarizing what it deems to be high-quality studies on
the impact of class-size reduction. The report (correctly, in my opinion) emphasizes that only
studies that have employed careful research methodologies are appropriate to consider. Because
public policy is primarily interested in whether a policy can change outcomes, and not simply
whether there is a non-causal, correlational relationship, it is vital to place primary emphasis
on studies that can isolate the causal impact of class size. The report argues that the
highest-quality studies are based on randomized experiments, natural experiments (also known as
“quasi-experiments”), or studies based on sophisticated mathematical modeling. Unfortunately,
as seen below, it does not uniformly apply these standards to its literature review, and as a result
the report presents misleading results.

V. Review of the Report’s Methods 

The study is primarily based on a review of the literature. As noted, the report reviews several
studies that the authors consider “high quality,” and this approach is sensible. However, the
report’s execution of this approach paints a picture of the class-size literature that is less
conclusive than it in fact is. A recent review coauthored by MIT professor Joshua Angrist
characterizes the class-size literature as being in substantial agreement about the magnitude of
the benefits of smaller class sizes. He argues that the high-quality studies “have produced
estimates within a remarkably narrow band” and that “the weight of the evidence suggests that
class-size reductions generate modest achievement gains” on the order of a 0.2 to 0.3 standard
deviation increase for a 10-student reduction in class size. The Brookings report comes to a
different conclusion in part because it places too much emphasis on some mixed or negative
studies that do not actually measure up to the authors’ expressed high standards. In addition,
the STAR results are mischaracterized as too large and as out of line with the literature. As
described below, the STAR study and its results deserve substantially stronger weight than that
given in the Brookings report.

Study reviews 

The study notes that “the most influential and credible study of CSR is the Student Teacher
Achievement Ratio” (STAR). In STAR, students and teachers in 79 Tennessee elementary
schools were randomly assigned to small or regular-sized classes from 1985 to 1989. Such
randomized experiments are generally characterized as the gold standard of social science
research. This is because any positive gains in outcomes can be attributed with great confidence
to being assigned to a smaller class. There have been many secondary studies using STAR data,
and those studies have consistently found positive impacts not only on test scores but also on
later life outcomes such as criminal behavior and college enrollment. The findings indicate that
reducing class size from an average of 22 to an average of 15 improves average math and reading
scores by about 0.20 standard deviations. Low-income students and African American students
experience somewhat larger improvements from being assigned to a smaller class.

The Brookings report mischaracterizes the STAR effect sizes as unusually large relative to the
literature. As the only estimates that come from a large-scale, well-executed true random
experiment, though, these are the most trustworthy estimates in the literature. Thus, the STAR
results are the benchmark at this point in time. Research studies with less ideal research designs
compare the magnitude of their findings to those from this randomized experiment.

Another highly regarded class-size study described in the Brookings report used
quasi-experimental methods and comes from schools in Israel. Israel has a maximum class-size rule
of 40 students. As a result, if there are only 40 students in a given grade at a given school, they
can be in one single large class. If a 41st student enrolls, however, a second class must be added,
pushing average class size across the two classes down to 20.5. Leveraging this sharp change in
class size, the authors of the study find strong positive impacts of smaller class sizes, on the
order of a 0.026 standard deviation improvement in test scores for each student reduced in a
class. This implies that a 7-student reduction in class size increases test scores by 0.18 standard
deviations. Interestingly, the estimated impact in this study is very close to the one found in the
STAR experiment.
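
The arithmetic behind the maximum class-size rule and the implied effect size can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and scaling the per-student estimate linearly to a 7-student reduction simply mirrors the calculation in the paragraph above rather than reproducing the Israeli study's own method.

```python
import math

MAX_CLASS_SIZE = 40  # cap described in the text


def classes_needed(enrollment: int, cap: int = MAX_CLASS_SIZE) -> int:
    """Smallest number of classes that keeps every class at or below the cap."""
    return math.ceil(enrollment / cap)


def average_class_size(enrollment: int, cap: int = MAX_CLASS_SIZE) -> float:
    """Average class size once the grade is split into the required classes."""
    return enrollment / classes_needed(enrollment, cap)


def implied_effect(per_student_effect: float, reduction: int) -> float:
    """Assumption: the per-student effect scales linearly with the reduction."""
    return per_student_effect * reduction


print(average_class_size(40))              # one class of 40
print(average_class_size(41))              # two classes averaging 20.5
print(round(implied_effect(0.026, 7), 2))  # 7-student reduction, about 0.18 SD
```

The jump from 40.0 to 20.5 at the 41st student is the sharp change that the quasi-experimental design exploits.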

The report omits results from Wisconsin’s Student Achievement Guarantee in Education
(SAGE) program, which reduced pupil-teacher ratios in high-poverty schools from between 21:1
and 25:1 to between 12:1 and 15:1. Molnar and coauthors evaluated the impact of the program by
comparing test score growth rates in SAGE schools to those in comparison schools that are
closely matched in terms of students’ demographic characteristics and prior test scores. The
results are highly consistent with the results from STAR and the Israeli study. Overall, small
class attendance improved student achievement by approximately 0.2 standard deviations. Also
consistent with STAR, African American students received larger benefits from attending
smaller classes.

In addition, the report cites a well-regarded Connecticut study that found no positive effect of
reduced class size. This study took two approaches. The first relied upon the relatively modest
variation in enrollment and class sizes that is driven by random fluctuations in cohort sizes
across adjacent years and found no impact of smaller classes. The second took the same type of
approach as the Israeli study described above and found positive but statistically insignificant
impacts of class-size reduction.

The rest of the papers described in the study fail to meet the quality threshold set out by the
authors. That is, in order to qualify as a high-quality study, the study must be based on variation
in class size that is random or nearly random. An example of a problematic study would be one
that simply correlates class size to achievement outcomes without regard to why some students
are assigned to larger vs. smaller classes. For instance, sometimes low-achieving or special
needs students are systematically assigned to smaller classes where they can receive more
individualized attention. Likewise, advanced placement classes are often quite small. More
affluent districts frequently have smaller class sizes than impoverished districts. In natural
settings, there are any number of other variables that are not included in these analyses but
which could have a strong impact on achievement.

For example, one of the studies showing positive impacts of class-size reduction uses a large
dataset from Texas that allows the researchers to follow individual students over time. The
paper overall is a highly regarded study published in a top economics journal. However, the
paper’s primary objective is to measure the impact of teachers on their students. Its analysis of
class size is confined to a subsection, where the research approach is not well suited to
address the question. (In the study, the class-size effect is measured by the difference in class
size that a student experiences from one year to the next after accounting for stable and
unchanging characteristics of each individual student.)

The problem with the Texas study’s approach is that it only addresses one possible confounding
variable. There may be important determinants of year-to-year variation in class size that are
unobserved by the researchers but have a direct impact on student achievement. For instance, if
a principal observes that a student under-performed relative to her potential in a given year, the
principal may take that into account in the following year’s class assignments. If, as we would
naturally expect, any of the underlying factors that cause the assignment to smaller classes are
also correlated with academic achievement, then the class-size effect will be mismeasured.

Another confounded study, an unpublished working paper by one of the Brookings report’s
authors, examines a recent policy change in Florida and is cited as evidence against CSR as a
reform strategy. The study purports to investigate the impact of class-size reduction, but in fact
the policy change is much more complicated. The policy awarded additional resources to all
schools. Schools with large classes were required to use the resources to reduce class size, while
schools that already had small classes were also given a sizeable financial bonus that they could
use in any manner. In addition, as documented in a recent Think Tank Review, there were other
policy changes in Florida that were implemented at the same time, such as mandatory grade
retention.9 This type of study, even if executed perfectly, would not allow one to isolate or draw
conclusions about the impact of class-size reduction.

The Brookings report cites a Jepsen and Rivkin study of California class-size reduction that
finds positive impacts of class-size reduction, but smaller in magnitude than those found in
Tennessee and Israel. Unfortunately, the California policy was introduced in a manner that
made it impossible to credibly measure the impact of the policy. Test scores were not measured
prior to the introduction of the policy, nor were they measured in the primary grades (where the
reduction occurred). It should also be noted that an additional, thoughtful study of the
California policy by Bohrnstedt and Stecher, which the Brookings report does not cite, argues
that the data limitations are so severe that drawing any conclusions about the impact on student
achievement is not justified. Although Jepsen and Rivkin make a valiant effort to isolate the
impact of the policy, in the end the results are confounded by class-size reductions in earlier,
untested grades.

The Brookings study also includes a pair of papers that investigate class size in eighth grade and
find mixed results. Although each of these middle school papers has strengths and weaknesses,
ultimately they are less relevant in the current policy context that is focused on early grades. It is
worth noting, though, that the Brookings study omits a Danish paper that finds class-size effects
for ninth-grade students that are of the same magnitude as those found in STAR, Israel and
SAGE.14

To be sure, the number of strong studies on class-size reduction is small. However, in its choice
of studies to cite, the Brookings report puts too much faith in weaker studies and, thus,
improperly represents what we do know about the effects of CSR.

Cost-benefit analysis 

The Brookings report estimates that approximately 2% of total K-12 public education spending
($12 billion per year) can be saved by increasing average class size across the United States by a
single student. While the Brookings report does extrapolate costs down to the effect of one
student, it does not directly address instructional losses. It only opines, “But if schools choose
the least effective teachers to let go, then the effect of increased teacher quality could make up
for some or all of the possible negative impact of increasing class size.”

A key practical problem with the cost-savings calculation, though, is that classes and teachers
are not so easily divisible. Indeed, this is why research designs such as those based on the
maximum class-size rule in Israel are so strong: when a school sees an enrollment increase from
40 to 41, it has to hire an entire additional teacher and cannot instead hire a fraction of one. As a
result, average class size declines sharply when a second teacher is hired (from 40 to 41/2, or
20.5).

The same concept applies in reverse when one considers increasing class size. Imagine a K-5
school that has 100 students per grade spread across four classrooms in each grade. Currently,
each of the 24 classrooms has a class size of 25 students. If the district were to reduce the
teaching force in this school by one, the new average pupil-teacher ratio would be about 26,
characterized in the report as a relatively small increase in class size for potentially large
budgetary savings. To obtain this, though, one cannot increase each class size by a single
student. Each grade still has 100 students, and unless the school or district engages in some
creative multi-grade classrooms or redistricting, most grades will still have 4 teachers and a
class size of 25. To save one teacher, though, one grade would have to be reduced from 4
classrooms to 3, raising the average class size in that grade from 25 to 33.3. Among the children
in this grade, the negative impacts would be striking. The school’s decision about which grade
level should be disadvantaged in this manner would certainly be complicated.
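
The school example above can be checked with a short Python sketch. The function name and its parameters are hypothetical, chosen only to mirror the numbers in the paragraph (six grades, 100 students per grade, four classrooms per grade), and it assumes that the single teacher cut falls entirely on one grade.

```python
def school_after_one_cut(students_per_grade: int = 100,
                         grades: int = 6,
                         rooms_per_grade: int = 4) -> tuple[float, float, float]:
    """Return (ratio before, school-wide ratio after, class size in the affected grade)."""
    total_students = students_per_grade * grades
    total_rooms = rooms_per_grade * grades

    # School-wide pupil-teacher ratio before and after removing one teacher.
    ratio_before = total_students / total_rooms
    ratio_after = total_students / (total_rooms - 1)

    # Assumption: the cut is absorbed by a single grade, which loses one classroom.
    affected_grade_size = students_per_grade / (rooms_per_grade - 1)
    return ratio_before, ratio_after, affected_grade_size


before, after, affected = school_after_one_cut()
print(round(before, 1), round(after, 1), round(affected, 1))  # 25.0 26.1 33.3
```

The school-wide average moves by about one student, while the class size in the affected grade rises by more than eight.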

Unaddressed in the Brookings cost analysis is a growing body of research on early childhood
interventions that shows that CSR may have a significant and positive impact on long-term
outcomes, even when short-term test scores show more modest effects or rapid fadeout.
Research on the Perry Preschool Project in particular suggests lower rates of adult criminal
behavior among participants, and a recent study of Head Start shows important gains in a variety
of outcomes measured in young adulthood, such as high school graduation rates and health status.
Evidence from follow-up studies of the STAR participants suggests that there are long-term
benefits on a variety of outcomes that would not have been predicted from test score increases.
If these potential benefits are ignored, then a simple cost-benefit analysis based on short-term
and narrowly measured gains is inadequate.

In addition, the report’s comparative cost-benefit claims fail to acknowledge the uncertainty
surrounding such estimates.

VI. Review of the Findings and Conclusions 

The report gives a misleading summary of the literature. It includes studies that show mixed or
negative results that are of insufficient quality, and studies that investigate older students who
are less likely to be subject to low-class-size policy initiatives.

The report also mischaracterizes the STAR findings as unusually large relative to other studies.
For example, comparing STAR to the Israeli study, the report says that the Israeli results are “on
the lower end of the range of those found in the STAR study.” In contrast, Joshua Angrist, one of
the authors of the Israeli study, states in his econometrics textbook that the impacts he finds are
nearly identical to those in STAR.

Further, the report presents misleading statistics on the potential cost savings of a one-student
increase in class size. Because one cannot dismiss a fraction of a teacher but instead must
dismiss a whole one, a one-student increase in average pupil-teacher ratio would likely result in
very large class-size increases for some students.

Finally, after saying, “there is no research from the U.S. that directly compares CSR to specific
alternative investments,” the report states that CSR is the “least cost-effective” policy based on a
study that only accounts for a subset of the benefits of the policy. In making this claim, the
report provides no evidence in support of potential alternative policies or whether any or all of
those alternate policies could feasibly be implemented on a large scale.



VII. Usefulness of the Report for Guidance of Policy and Practice 

The Brookings report is of limited use in policy debates about the role of class size on student
achievement or as an effective guideline for financial savings. It provides a misleading
characterization of the prior research literature, and it implies that class-size increases will have
little impact on students. It also bases much of its argument on the impractical and questionable
assumption that any reduction in the teacher workforce can be made on the basis of
instructional quality instead of according to the terms of teachers’ current contracts. It does,
however, make the important point that class-size reduction may be more effective for
disadvantaged students and young students, and consequently that potential increases in class
size would be particularly detrimental to these groups. Overall, the authors fail to make their
case that increasing class size is either relatively harmless to school quality or a cost-effective
way of saving money.